Final Project Guidelines

0.1 Your Task

For this project, you are expected to use a two-way ANOVA to investigate the relationship between one numerical response variable and two categorical explanatory variables. You are permitted to use the same dataset as your Midterm Project, so long as there are at least two categorical variables to choose from.

Note

If you would like to analyze a discrete numerical variable (e.g., number of pets) as a categorical variable, you will need to convert it into a categorical variable in R, because R treats any variable stored as numbers as numerical.

You will need Dr. Theobold’s help to perform this task. Dr. Theobold will work with you to convert your variable as long as you request help before Friday at 4pm.
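For orientation, here is a minimal sketch of what such a conversion can look like. It uses R's built-in ToothGrowth data rather than a course dataset, and the column name dose_cat is a hypothetical name chosen only for this illustration. Your own data may need a different approach.

```r
# ToothGrowth ships with base R; `dose` is stored as a number (0.5, 1, 2)
teeth <- ToothGrowth
class(teeth$dose)       # "numeric": R treats dose as a numerical variable

# factor() turns each distinct value into a level of a categorical variable
# (dose_cat is a hypothetical column name used only for this sketch)
teeth$dose_cat <- factor(teeth$dose)

class(teeth$dose_cat)   # "factor"
levels(teeth$dose_cat)  # "0.5" "1" "2"
```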

1 Introduction

1.1 Data Description

In 4-6 sentences, describe:

  • how the data were collected
  • the context of the data (e.g., are the data from a published study?)
  • the background of the research problem (e.g., why were the data collected?)

1.2 Questions of Interest

State the question(s) of interest you will address with your statistical analysis. The more specifically you define the question of interest here, the easier the rest of the analysis and report will be. The research questions should start with, “What is the relationship between…” and should be as specific as possible. Your Findings section should directly address the question(s) you pose here.

Multiple research questions

This week you are starting with research questions that are appropriate for the one-way ANOVA models you are fitting. However, you may find that you need to add research questions based on the models you fit in Week 10!

2 Methods

This section should lay out the steps, decisions, and logic leading to the statistical model you will use to answer the research question of interest.

  • Describe the response and explanatory variables, how they were measured and their associated units. For categorical variables, describe the levels of the categorical variable.

  • Produce data visualizations exploring the relationship(s) you are interested in investigating, including whether a second explanatory variable is needed. For your project, everyone will have three visualizations:

    • a visualization of the relationship between your response variable and explanatory variable 1
    • a visualization of the relationship between your response variable and explanatory variable 2
    • a visualization of the relationship between your response variable and both explanatory variables

Visualizations with categorical variables

In Lab 3 you learned how to make density ridge plots, the recommended visualization for the relationship between a numerical variable and a categorical variable. For your project you are required to use density ridge plots. If you are unsure how to accomplish this task, look back over Lab 3 and the Week 3 R resources.

Every visualization should have nicely formatted axis labels!
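The three required visualizations could be sketched roughly as follows. This assumes the ggplot2 and ggridges packages are installed and uses R's built-in ToothGrowth data as a stand-in for your own dataset. The object names and axis labels are illustrative, and Lab 3 may build these plots slightly differently.

```r
library(ggplot2)
library(ggridges)  # provides geom_density_ridges()

# Built-in example data (an assumption for this sketch): tooth length (len),
# supplement type (supp), and dose, which is converted to a categorical variable
teeth <- ToothGrowth
teeth$dose <- factor(teeth$dose)

# Response variable vs. explanatory variable 1
plot_dose <- ggplot(teeth, aes(x = len, y = dose)) +
  geom_density_ridges() +
  labs(x = "Tooth length", y = "Dose of vitamin C (mg/day)")

# Response variable vs. explanatory variable 2
plot_supp <- ggplot(teeth, aes(x = len, y = supp)) +
  geom_density_ridges() +
  labs(x = "Tooth length", y = "Supplement type")

# Response variable vs. both explanatory variables:
# the second explanatory variable is mapped to the fill color
plot_both <- ggplot(teeth, aes(x = len, y = dose, fill = supp)) +
  geom_density_ridges(alpha = 0.5) +
  labs(x = "Tooth length", y = "Dose of vitamin C (mg/day)",
       fill = "Supplement type")

plot_both  # printing a plot object displays it
```

Mapping the second explanatory variable to fill with a partially transparent alpha keeps the overlapping ridges readable. Faceting by the second variable would be another reasonable choice.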

  • Describe what you see in the visualizations, making direct references to the plots!

  • Outline the statistical model you will use to answer the question(s) of interest that you stated previously.

Not visually selecting a model

Unlike the Midterm Project, you will not be choosing what statistical model to fit based on the visualizations. Everyone will be starting with two one-way ANOVA models to analyze their data!

  • Evaluate the conditions of the statistical model you propose to use.

Condition violations

If you find through the study design and/or your visualizations that certain model conditions are violated, you are expected to do your best to remedy these violations. If you need help figuring out how to do this, email Dr. Theobold!
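As a rough sketch of how two of the ANOVA conditions (equal variability across groups and normality of residuals) might be checked numerically and visually, consider the code below. It again assumes the built-in ToothGrowth data as a stand-in for your dataset. The independence condition is not checked here because it depends on how the data were collected, and the course may ask you to assess these conditions from your density ridge plots instead.

```r
# Built-in example data (an assumption for this sketch)
teeth <- ToothGrowth
teeth$dose <- factor(teeth$dose)

# Equal variability: compare the spread of the response across groups
group_sds <- tapply(teeth$len, teeth$dose, sd)
group_sds

# Ratio of the largest to the smallest group standard deviation;
# values far above 1 suggest the groups have unequal variability
sd_ratio <- max(group_sds) / min(group_sds)
sd_ratio

# Normality: a normal Q-Q plot of the residuals from a one-way ANOVA model
fit <- aov(len ~ dose, data = teeth)
qqnorm(residuals(fit))
qqline(residuals(fit))
```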

3 Findings

In this section you will write up your findings for your question of interest. I’ve created an example of how this section could look in this document.

Caution

You must state what \(\alpha\) threshold was used when reaching your hypothesis test decisions.

3.1 One-Way ANOVA Model

Everyone starts here! In this section, you need to do the following:

  • Fit a one-way ANOVA model
  • Obtain the ANOVA table for the model
  • Based on the ANOVA table, state what decision was reached for the hypothesis test in the one-way ANOVA model
  • Based on the decision you made, state what you can conclude regarding the relationship between your variables
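
The steps above could look roughly like the following sketch. It assumes the built-in ToothGrowth data and an \(\alpha\) of 0.05, both chosen only for illustration, and it uses base R's aov(). The course materials may use different functions to fit the model and obtain the table.

```r
# Built-in example data (an assumption for this sketch)
teeth <- ToothGrowth
teeth$dose <- factor(teeth$dose)

alpha <- 0.05  # assumed threshold for this sketch; state whichever one you use

# Fit the one-way ANOVA model: response ~ categorical explanatory variable
fit <- aov(len ~ dose, data = teeth)

# ANOVA table: degrees of freedom, sums of squares, mean squares, F, p-value
anova_table <- summary(fit)[[1]]
anova_table

# The p-value for the explanatory variable is in the first row of the table
p_value <- anova_table[["Pr(>F)"]][1]

# Compare the p-value to alpha to reach a decision
if (p_value < alpha) {
  decision <- "reject the null hypothesis"
} else {
  decision <- "fail to reject the null hypothesis"
}
decision
```

The decision is only the third step. The conclusion still needs to be written in terms of the relationship between your response and explanatory variables.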

3.2 Conclusions

Based on the results of your analysis, what is your conclusion for the questions of interest? Connect your conclusion(s) to the relationships you saw in the visualizations you made.

In this section you should also describe whether you believe the tests you performed are “reliable”; that is, were any of the conditions required of a one-way ANOVA model violated?

4 Scope of Inference

Write a 4-5 sentence statement on what can be inferred from the design of the study and the results of your statistical analysis. Specifically, answer these two questions and comment on their implications:

  • Based on the sampling method used, to what larger population can you generalize the results of your analysis?

Tip

Your statement needs to include a description of (1) how the data were collected, and (2) the population to whom the results can be applied. You must justify your reasoning for #2 using information from the design of the study.

  • Based on the design of the study, what type of statements can be made about the relationship between the explanatory and response variables?

Tip

Your statement needs to include a description of (1) how the study was designed, and (2) what statements can be made about the relationship between the variables. You must justify your reasoning for #2 by making direct reference to the variables included in the study. General statements about “this was an observational study” are insufficient.